Consistent Regions: Guaranteed Tuple Processing in IBM Streams

نویسندگان

  • Gabriela Jacques-Silva
  • Fang Zheng
  • Daniel Debrunner
  • Kun-Lung Wu
  • Victor Dogaru
  • Eric Johnson
  • Michael Spicer
  • Ahmet Erdem Sariyüce
چکیده

Guaranteed tuple processing has become critically important for many streaming applications. This paper describes how we enabled IBM Streams, an enterprise-grade stream processing system, to provide data processing guarantees. Our solution goes from language-level abstractions to a runtime protocol. As a result, with a couple of simple annotations at the source code level, IBM Streams developers can define consistent regions, allowing any subgraph of their streaming application to achieve guaranteed tuple processing. At runtime, a consistent region periodically executes a variation of the Chandy-Lamport snapshot algorithm to establish a consistent global state for that region. The coupling of consistent states with data replay enables guaranteed tuple processing.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Semantic Query Optimization in an Automata-Algebra Combined XQuery Engine over XML Streams

Our Raindrop framework [6, 9] aims at tackling challenges of stream processing that are particular to XML. In contrast to the tuple-based or object-based data streams, XML streams are usually modeled as a sequence of primitive tokens, such as a start tag, an end tag or a PCDATA item. Unlike a self-contained tuple or object whose semantics are completely determined by its own values, a token lac...

متن کامل

Language Runtime and Optimizations in IBM Streams

Stream processing is important for continuously transforming and analyzing the deluge of data that has revolutionized our world. Given the diversity of application domains, streaming applications must be both easy to write and performant. Both goals can be accomplished by high-level programming languages. Dedicated language syntax helps express stream programs clearly and concisely, whereas the...

متن کامل

ارائه روشی پویا جهت پاسخ به پرس‌وجوهای پیوسته تجمّعی اقتضایی

Data Streams are infinite, fast, time-stamp data elements which are received explosively. Generally, these elements need to be processed in an online, real-time way. So, algorithms to process data streams and answer queries on these streams are mostly one-pass. The execution of such algorithms has some challenges such as memory limitation, scheduling, and accuracy of answers. They will be more ...

متن کامل

iJoin: Importance-Aware Join Approximation over Data Streams

We consider approximate join processing over data streams when memory limitations cause incoming tuples to overflow the available space, precluding exact processing. Selective eviction of tuples (loadshedding) is needed, but is challenging since data distributions and arrival rates are unknown a priori. Also, in many real-world applications such as for the stock market and sensor-data, differen...

متن کامل

PLinda 2.0: A Transactional/Checkpointing Approach to Fault Tolerant Linda

Robust parallel computation in Linda requires both tuple space and processes to be resilient to failure. In this paper, we present PLinda 2.0, set of extensions to Linda to support robust parallel computation on loosely coupled processors communicating over a network. The principal extensions of PLinda 2.0 to Linda are transaction mechanisms for reliable tuple space and process-private logging ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • PVLDB

دوره 9  شماره 

صفحات  -

تاریخ انتشار 2016